-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34819][SQL] MapType supports orderable semantics #31967
Conversation
@hvanhovell @cloud-fan @maropu Could you please help review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a re-proposal for https://issues.apache.org/jira/browse/SPARK-18134 ? I thought the decision made on that JIRA is we won't support this. But we also encountered this when migrating from Hive to Spark. We worked around this by adding a logical plan rule to covert map to sorted array if needed.
@c21 we should support this. I just ran out of time when I was working on it. |
Ok to test |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136559 has finished for PR 31967 at commit
|
Test build #136556 has finished for PR 31967 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
f60a5c4
to
58bb3cc
Compare
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136614 has finished for PR 31967 at commit
|
@@ -687,6 +688,118 @@ class CodegenContext extends Logging { | |||
} | |||
""" | |||
s"${addNewFunction(compareFunc, funcCode)}($c1, $c2)" | |||
case _ @ MapType(keyType, valueType, valueContainsNull) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a difference from the @hvanhovell impl.? The @hvanhovell one looks simpler though.
https://github.com/apache/spark/pull/15970/files#diff-1501206e78d34b65183af1092c8ec392ce18574bb538f905ca93a22983c63ae6R558-R598
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we cannot reuse the Array case? https://github.com/apache/spark/pull/31967/files#diff-1501206e78d34b65183af1092c8ec392ce18574bb538f905ca93a22983c63ae6R643
@@ -141,6 +139,27 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] { | |||
val function = normalize(lv) | |||
KnownFloatingPointNormalized(ArrayTransform(expr, LambdaFunction(function, Seq(lv)))) | |||
|
|||
case _ if expr.dataType.isInstanceOf[MapType] => | |||
val MapType(kt, vt, containsNull) = expr.dataType | |||
var normalized = if (needNormalize(kt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you avoid to use var
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests for this new code path in NormalizeFloatingPointNumbersSuite
?
*/ | ||
object NormalizeMapType extends Rule[LogicalPlan] { | ||
def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
case w: Window if w.partitionSpec.exists(p => needNormalize(p)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't support BinaryComparison
cases?
|
||
override def nullSafeEval(input: Any): Any = { | ||
val childMap = input.asInstanceOf[MapData] | ||
val keys = childMap.keyArray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to sort data recursively just for nested case like map<map<int,int>,string>
and map<struct<a: map<int,int>>,string>)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that I missed this case. I'll fix it
} | ||
} | ||
|
||
case class SortMapKey(child: Expression) extends UnaryExpression with ExpectsInputTypes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SortMapKey
-> SortMapKeys
?
mapBuilder.build() | ||
} | ||
|
||
override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this PR simpler, how about leaving the codegen support into follow-up PRs just like the original PR? https://github.com/apache/spark/pull/15970/files#diff-da163d97a5f0fc534aad719c4a39eca97116f25bfc05b7d8941b342a3ed96036R423-R429
Batch("ReplaceUpdateFieldsExpression", Once, ReplaceUpdateFieldsExpression) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary change.
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove unnecessary blank lines.
* to insert an expression to sort map entries by key. | ||
* | ||
* Note that, this rule must be executed at the end of optimizer, because the optimizer may create | ||
* new joins(the subquery rewrite) and new join conditions(the join reorder). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you leave some comments about why this rule does not handle the Aggregate cases? https://github.com/apache/spark/pull/31967/files#diff-21f071d73070b8257ad76e6e16ec5ed38a13d1278fe94bd42546c258a69f4410R344
|
||
test("SPARK-34819: MapType has nesting complex type supports orderable semantics") { | ||
Seq(CodegenObjectFactoryMode.CODEGEN_ONLY.toString, | ||
CodegenObjectFactoryMode.NO_CODEGEN.toString).foreach { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move the two tests into SQLQueryTestSuite
? You can use the CONFIG_DIM
directive there:
https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/postgreSQL/join.sql#L18-L20
Any update? |
@WangGuangxin If you cannot keep working on it, is it okay that I take this over? |
Sure, I'm stuck with something else, you can take this over if you have time. Thanks |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Currently MapType doesn't support orderable semantics, while it's supported in Hive/Presto. This makes it hard to migrate from Hive to SparkSQL if user have groupby/orderby map type in their sql.
Why are the changes needed?
Generally, we compare two maps by the following steps:
We have to specially handle this in grouping/join/window because Spark SQL turns grouping/join/window partition keys into binary
UnsafeRow
and compare the binary data directly instead of using MapType's ordering. In this case, we have to insert aSortMapKey
expression to sort map entry by key. This is very similiar toNormalizeFloatingNumbers
Does this PR introduce any user-facing change?
No
How was this patch tested?
Add more UTs